Tag

#attention mechanism

12 articles

Moonshot AI Releases Kimi K3: A 2.8 Trillion Parameter Open MoE Model With Kimi Delta Attention and 1M Context

Learn how Kimi K3, Moonshot AI's 2.8-trillion-parameter open model, combines a Mixture-of-Experts architecture with Kimi Delta Attention and a 1M-token context window to become more efficient and more capable than previous models.

Jul 16

Baidu's "Unlimited OCR" processes dozens of document pages in one pass by treating memory like human forgetting

Learn how Baidu's Unlimited OCR processes dozens of document pages in a single pass by mimicking the way human memory forgets.

Jul 5

A startup says it cracked the maths bottleneck holding back AI. It finally has the receipts.

A Miami startup claims to have cracked a key mathematical bottleneck in AI, making large language models faster and more energy-efficient. Independent tests back up its claims.

Jun 19

MiniMax Sparse Attention (MSA): A Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

Learn to implement a simplified version of MiniMax Sparse Attention, a two-branch block-sparse mechanism that reduces the computational cost of attention while maintaining performance.

Jun 16

Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch

Learn how Parallax, a parameterized local linear attention, improves language understanding by keeping standard softmax attention and adding a learned covariance correction branch.

May 31

So you’ve heard these AI terms and nodded along; let’s fix that

Learn to implement and experiment with fundamental AI concepts including neural networks, transformers, and attention mechanisms through hands-on coding exercises.

May 29

NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

NVIDIA's Gated DeltaNet-2 decouples the erase and write operations of the delta rule in linear attention, outperforming models such as Mamba-2 and KDA (Kimi Delta Attention) on long-context tasks.

May 23

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Learn how Lighthouse Attention, a training-only hierarchical attention, speeds up long-context pretraining by 1.4–1.7× by attending selectively to the most important parts of the input, without sacrificing accuracy.

May 16

Moonshot AI Open-Sources FlashKDA: CUTLASS Kernels for Kimi Delta Attention with Variable-Length Batching and H20 Benchmarks

Learn how to set up and use FlashKDA, Moonshot AI's open-source CUTLASS implementation of Kimi Delta Attention, to accelerate attention computation in large language models.

Apr 30

Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost

Learn how Xiaomi's MiMo-V2.5 models match frontier-level benchmarks at significantly lower token cost, with a focus on agentic AI, token efficiency, and optimization techniques.

Apr 22

Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput

Learn how TriAttention compresses the KV cache of large language models to reach 2.5× higher throughput while matching the accuracy of full attention.

Apr 11

Liquid AI’s New LFM2-24B-A2B Hybrid Architecture Blends Attention with Convolutions to Solve the Scaling Bottlenecks of Modern LLMs

Learn to build a hybrid neural network architecture that combines attention mechanisms with convolutional layers, similar to Liquid AI's LFM2-24B-A2B model, to address scaling bottlenecks in large language models.

Feb 25